{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 代码实现"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将开始动手实现一个C3D模型并在UCF101数据集上进行训练和测试。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-04-11T07:31:04.787604Z",
     "iopub.status.busy": "2023-04-11T07:31:04.787213Z",
     "iopub.status.idle": "2023-04-11T07:31:04.950726Z",
     "shell.execute_reply": "2023-04-11T07:31:04.949574Z",
     "shell.execute_reply.started": "2023-04-11T07:31:04.787563Z"
    }
   },
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "\n",
    "class C3D(nn.Module):\n",
    "    \"\"\"\n",
    "    C3D，Convolution 3D，即三维卷积\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self):\n",
    "        super(C3D, self).__init__()\n",
    "        # 注意到核函数为三维，符合三维卷积的要求\n",
    "        self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))\n",
    "\n",
    "        self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))\n",
    "\n",
    "        self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))\n",
    "\n",
    "        self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))\n",
    "\n",
    "        self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))\n",
    "        self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))\n",
    "\n",
    "        self.fc6 = nn.Linear(8192, 4096)\n",
    "        self.fc7 = nn.Linear(4096, 4096)\n",
    "        self.fc8 = nn.Linear(4096, 487)\n",
    "\n",
    "        self.dropout = nn.Dropout(p=0.5)\n",
    "\n",
    "        self.relu = nn.ReLU()\n",
    "        self.softmax = nn.Softmax()\n",
    "\n",
    "    def forward(self, x):\n",
    "\n",
    "        h = self.relu(self.conv1(x))\n",
    "        h = self.pool1(h)\n",
    "\n",
    "        h = self.relu(self.conv2(h))\n",
    "        h = self.pool2(h)\n",
    "\n",
    "        h = self.relu(self.conv3a(h))\n",
    "        h = self.relu(self.conv3b(h))\n",
    "        h = self.pool3(h)\n",
    "\n",
    "        h = self.relu(self.conv4a(h))\n",
    "        h = self.relu(self.conv4b(h))\n",
    "        h = self.pool4(h)\n",
    "\n",
    "        h = self.relu(self.conv5a(h))\n",
    "        h = self.relu(self.conv5b(h))\n",
    "        h = self.pool5(h)\n",
    "\n",
    "        h = h.view(-1, 8192)\n",
    "        h = self.relu(self.fc6(h))\n",
    "        h = self.dropout(h)\n",
    "        h = self.relu(self.fc7(h))\n",
    "        h = self.dropout(h)\n",
    "\n",
    "        logits = self.fc8(h)\n",
    "        probs = self.softmax(logits)\n",
    "\n",
    "        return probs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将基于C3D在UCF101数据集中实现视频动作识别。这里我们将直接调用已经完成的代码，你可以在GitHub链接中查看详细的项目。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-04-11T07:31:04.953100Z",
     "iopub.status.busy": "2023-04-11T07:31:04.952673Z",
     "iopub.status.idle": "2023-04-11T07:31:05.229727Z",
     "shell.execute_reply": "2023-04-11T07:31:05.228024Z",
     "shell.execute_reply.started": "2023-04-11T07:31:04.953056Z"
    }
   },
   "outputs": [],
   "source": [
    "!git clone https://github.com/Niki173/C3D"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在配置好UCF101数据集和模型预训练的权重之后，我们便可以进行C3D的训练。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-04-11T07:31:05.231676Z",
     "iopub.status.busy": "2023-04-11T07:31:05.231267Z",
     "iopub.status.idle": "2023-04-11T07:31:05.497611Z",
     "shell.execute_reply": "2023-04-11T07:31:05.495635Z",
     "shell.execute_reply.started": "2023-04-11T07:31:05.231630Z"
    }
   },
   "outputs": [],
   "source": [
    "!python train.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里我们只展示部分训练流程。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<center>\n",
    "    <img style=\"border-radius: 0.3125em;\" \n",
    "    src=\"https://pic4.zhimg.com/80/v2-bd336d8be7e7f738db8e2ab5ae73005f_1440w.jpg\" width=1000>\n",
    "    <br>\n",
    "    <div style=\"color:orange; \n",
    "    display: inline-block;\n",
    "    color: #999;\n",
    "    padding: 2px;\"></div>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，我们在测试集上进行测试，准确率可达 96\\%。\n",
    "\n",
    "\n",
    "<center>\n",
    "    <img style=\"border-radius: 0.3125em;\" \n",
    "    src=\"https://pic3.zhimg.com/80/v2-6ef00350bc6f67a587ff4aca92e6f16e_1440w.jpg\" width=1000>\n",
    "    <br>\n",
    "    <div style=\"color:orange; \n",
    "    display: inline-block;\n",
    "    color: #999;\n",
    "    padding: 2px;\"></div>\n",
    "</center>\n",
    "\n",
    "\n",
    "<center>\n",
    "    <img style=\"border-radius: 0.3125em;\" \n",
    "    src=\"https://pic4.zhimg.com/80/v2-70f46ed956b35461b3539053311877c7_1440w.jpg\" width=600>\n",
    "    <br>\n",
    "    <div style=\"color:orange; \n",
    "    display: inline-block;\n",
    "    color: #999;\n",
    "    padding: 2px;\"></div>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- 目前的困难：\n",
    "1. 姿态估计的精确度问题：由于人体姿态估计系统使用的是多种视觉信息，如果它们不能够准确地追踪到人体的每个部位，那么估计出来的结果也会不准确。\n",
    "1. 尺度不变性问题：尽管现有的人体姿态估计算法可以处理不同尺度的图像，但它们仍然存在尺度不变性问题，即当姿态发生变化时，估计的结果可能会受到影响。\n",
    "1. 光照变化问题：由于光照变化会对视觉信息产生影响，因此人体姿态估计系统可能无法准确地识别不同光照环境下的人体部位。\n",
    "1. 复杂背景问题：复杂的背景可能会干扰人体姿态估计系统的性能，使得它无法准确地识别人体部位。 -->"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
